The Number of Attractors in Kauffman Networks 
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The Kauffman model describes a particularly simple class of random Boolean networks. Despite 
the simplicity of the model, it exhibits complex behavior and has been suggested as a model for real 
world network problems. We introduce a novel approach to analyzing attractors in random Boolean 
networks, and applying it to Kauffman networks we prove that the average number of attractors 
grows faster than any power law with system size. 

PACS numbers: 89.75.Hc, 02.70.Uu 
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INTRODUCTION 

We are increasingly often faced with the problem of 
modeling complex systems of interacting entities, such as 
social and economic networks, computers on the Internet, 
or protein interactions in living cells. Some properties of 
such systems can be modeled by Boolean networks. The 
appeal of these networks lies in the finite (and small) 
number of states each node can be in, and the ease with 
which we can handle the networks in a computer. 

A deterministic Boolean network has a finite number 
of states. Each state maps to one state, possibly itself. 
Thus, every network has at least one cycle or fixed point, 
and every trajectory will lead to such an attractor. The 
behavior of attractors in Boolean networks has been in- 
vestigated extensively, see e.g. 0, 0, IS El IE • F° r a 
recent review, see Q. 

A general problem when dealing with a system is find- 
ing the set of attractors. For Boolean networks with more 
than a handful of nodes, state space is too vast to be 
searched exhaustively. In some cases, a majority of the 
attractor basins are small and very hard to find by ran- 
dom sampling. One such case is the Kauffman model 
8J. Based on experience with random samplings, it has 
been commonly believed that the number of attractors 
in that model grows like the square root of syst em size. 
Lately this has been brought into question [flt llCl. 111! Il2j | . 
Using an analytic approach, we are able to prove that 
the number attractors grows faster than any power law. 
The approach is based on general probabilistic reasoning 
that may also be applied to other systems than Boolean 
networks. The derivations in the analysis section are just 
one application of the approach. We have attempted to 
focus on the method and our results, while keeping the 
mathematical details at a minimum level needed for re- 
producibility using standard mathematical methods. 

In 1969 Kauffman introduced a type of Boolean net- 
works as a model for gene regulation [8| . These networks 
are known as N-K models, since each of the N nodes 
has a fixed number of inputs K. A Kauffman network 
is synchronously updated, and the state (0 or 1) of any 
node at time step t is some function of the state of its 



input nodes at the previous time step. An assignment of 
states to all nodes is referred to as a configuration. When 
a single network, a realization, is created, the choice of 
input nodes and update functions is random, although 
the update functions are not necessarily drawn from a 
flat distribution. This reflects a null hypothesis as good 
or bad as any, if we have no prior knowledge of the details 
of the networks we wish to model. 

In this letter, we will first present our approach, and 
then apply it to Kauffman's original model, in which 
there are 2 inputs per node and the same probability for 
all of the 16 possible update rules. These 16 rules are the 
Boolean operators of two or fewer variables: and, OR, 
true, etc. This particular N-K model falls on the crit- 
ical line, where the network dynamics is neither ordered 
nor chaotic 0, 111 El- 



APPROACH 

Our basic idea is to focus on the problem of finding 
the number of cycles of a given length L in networks of 
size N. As we will see, the discreteness of time makes it 
convenient to handle cycles as higher-dimensional fixed 
point problems. Then it is possible to do the probabilis- 
tic averaging in a way which is suitable for an analytic 
treatment. This idea may also be expanded to applica- 
tions with continuous time, rendering more complicated 
but also more powerful methods. 

We use four key assumptions: (i) the rules are cho- 
sen independently of each other and of N, (il) the input 
nodes are independently and uniformly chosen from all N 
nodes, (in) the dynamics is dominated by stable nodes, 
and (iv) the distribution of rules is invariant due to inver- 
sion of any set of inputs, (iv) means e.g. that the fraction 
of AND and NOR gates are the same whereas the fraction 
AND and NAND gates may differ, (iv) is presumably not 
necessary, but simplifies the calculations drastically, (in) 
is expected to be valid for any non-chaotic network obey- 
ing (i) and (il) 0| . Note that (i) does not mean that the 
number of inputs must be the same for every rule. We 
could write a general treatment of all models obeying 
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(i) - (iv), but for simplicity we focus on the Kauffman 
model. 

We will henceforth use (Cl) N to denote the expecta- 
tion value of the number of L-cycles over all networks of 
size TV, with L = 1 referring to fixed points. The average 
number of fixed points, (Ci) N , is particularly simple to 
calculate. For a random choice of rules, (i) and (iv) im- 
ply that the output state of the net is independent of the 
input state. Hence, the input and output states will on 
average coincide once on enumeration of all input states. 
This means that (Ci) N = 1. 

The problem of finding other L-cycles can be trans- 
formed to a fixed point problem. Assume that a Boolean 
network performs an L-cycle. Then each node performs 
one of 2 L possible time series of output values. Consider 
what a rule does when it is subjected to such time scries 
on the inputs. It performs some boolean operation, but 
it also delays the output, giving a one-step difference in 
phase for the output time series. If we view each time 
series as a state, we have a fixed point problem. L (Cl)^ 
is then the average number of input states (time series), 
for the whole network, such that the output is the same 
as the input. 

To take advantage of assumption (iv), we introduce 
the notion of L-cycle patterns. An L-cycle pattern is s 
and s inverted, where s is a time series with period L. 
Let Q denote a choice of L-cycle patterns for the net, 
and let P(Q) denote the probability that the output of 
the net is Q. Using the same line of reasoning as for fixed 
points, we conclude that (i) and (iv) yield 

Pl) n = ~ E P (Q) (!) 
QeQ~ 

where is the set of proper L-cycles of an iV-node net. 
A proper L-cycle has no period shorter than L. 

ANALYTIC CALCULATIONS 

Assumption (n) implies that -P(Q) is invariant under 
permutations of the nodes. Let n = (no, . . . , n m -i) de- 
note the number of nodes expressing each of the to = 
2 L_1 patterns. For nj, we refer to j as the pattern in- 
dex. For convenience, let the constant pattern have index 
0. Then 




where ( ) denotes the multinomial N\/(no\ ■ ■ ■ n m -il) 
and is the set of partitions n of N such that Q € Q^. 
That is, n represents a proper L-cycle. 

What we have this far is merely a division of the prob- 
ability of an L-cycle into probabilities of different flavors 
of L-cycles. Now we assume that each node has 2 inputs. 



Then, we get a simple expression for P(Q) that inserted 
into Eq. yields 




where {Pl)\ t 2 denotes the probability that the output 
pattern of a random 2-input rule has index j, given that 
the input patterns have the indices l\ and I2 respectively. 
Note that Eq. @ is an exact expression for the average 
number of proper L-cycles in an iV-node random Boolean 
network which satisfies the assumptions (1), (11), (iv), and 
where each node has 2 inputs. 

From now on, we only consider the Kauffman model, 
meaning that we also restrict the distribution of rules to 
be uniform. It is instructive to explore some properties 
of (Pl) j 1i12 \ these will also be needed in the following 
calculations. We see that 

(P £ )8o = l, {Pl)U = 1 and ( p L)li 2 >k ( 4 ) 

for 1 < l±,l2 < m. Further, we note that for a given 
j 7^ 0, (-Pl); i0 nas a non-zero value for exactly one l\ € 
{1, . . . , m — 1}. Let 4>l{j) denote that value of l±. We 
can see ^ as a function that rotates an L-cycle pattern 
one step backwards in time. With this in mind we define 
<^l(0) = 0. Now, we can write 

( p L)i l0 = 5<W(i) ( 5 ) 

for 1 < j < m. (S is the Kronecker delta.) 

We can view 4>l as a permutation on the set 
{0, . . . , to — 1}. Thus, we divide this index space into per- 
mutation cycles which are sets of the type {j, <^l(j), 4>l 
0i(j)j • • ■}• We refer to these permutation cycles as in- 
variant sets of L-cycles. Let p° L , . . . , p^ i_1 denote the 
invariant sets of L-cycles, where Hl is the number of 
such sets. For convenience, let po be the invariant set 
{0}. If two L-cycle patterns belong to the same invariant 
set, they can be seen as the same pattern except for a 
difference in phase. 

We want to find the behavior of (Cl) n , for large N, 
by approximating Eq. @ with an integral. To do this, 
we use Stirling's formula n\ m (n j e) n \J 2im while noting 
that the boundary points where rij = for some j can be 
ignored in the integral approximation. Let Xj = rij/N for 
j = 0, . . . , to — I and integrate over x = (xi, . . . , x m _i). 
xq is implicitly set to xq = 1 — YTjLi x j- We get 

1 //V\ (m ~ 1)/2 r e^W 

Q<x ,...,x m -i J 
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where 



Jx(x) 



m— 1 

E 

3=0 



In 



x ^ / 

3 0<h,l 2 <m 



(7) 



Eq. (JZJ) can be seen as an average (InX), where X is the 
expression inside the parentheses. Hence, the concavity 
of x — > In a; gives /l(x) = (InX) < In (X) = with 
equality if and only if 



XI x h x h( p L)i ll2 

0<l 1 ,l 2 <m 



(8) 



for all j = 0, . . . , m — 1. 

Note that Eq. (JSJ can be interpreted as a mean-field 
equation of the model. Using Eq. (JSJ for j = and Eq. 
(@J we see that Jl (x) comes arbitrarily close to zero only 
in the vicinity of x = 0, and for large N, the relevant 
contributions to the integral in Eq. © come from this 
region. Thus, the dynamics of the net is dominated by 
stable nodes, in agreement with 0, [^. This means 
that assumption (ill) is satisfied by the Kauffman model. 
Using Eqs. and JHJ), a Taylor-expansion of /i(ex) 
yields 



/ L (ex) = e 




(9) 



where {A J L ) h i 



The first order term of Eq. (JHJ) has as its maximum 
and reaches this value if and only if x<p L (j) = xj for all 
j = 1, . . . ,m— 1. The second order term is zero at these 
points, while the third order term is less than zero for 
all x 7^ 0. Hence, the first and third order terms are 
governing the behavior for large N. Using the saddle- 
point approximation, we reduce the integration space to 
the space where the first and second order terms are 0. 
Let z h = N 1 / 3 V,. h xa for h = 1, . . . , H L - 1 and let 

(^L)fc 1 fe 2 denote the probability that the output pattern 
of a random rule belongs to p\, given that the input pat- 
terns are randomly chosen from p k ^ and p^ 2 respectively. 
Thus, we approximate Eq. © for large N as 

(Cl) n « a L p L N^ 



where 
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(H L ~l)/2 



exp 



fa = 



H L -1 

E 

h=l 



B h L z) 



0<zi, 



H L -1 

n 

h=l 



(10) 



(11) 



(12) 



1L 



H L -1 



(13) 



and (B<l) klk2 = {P' L ) h klk2 - \ (6 hlh + 5 h2h ). (\p\ denotes 
the number of elements of the set p.) 

Hl grows rapidly with L. The number of elements in 
an invariant set of L-cycle patterns is a divisor of L. If an 
invariant set consists of only one pattern, it is either the 
constant pattern, or the pattern with alternating zeros 
and ones. The latter is only possible if L is even. Thus, 
H L - 1 > (2 L - 1 - 1)/L, with equality if L is a prime 
number > 2. Applying this conclusion to Eqs. i|13[l and 
pup, we see that for any power law N*, we can choose 
an L such that (Cl) n grows faster than N^. 



NUMERICAL RESULTS 

We have written a set of programs to compute the 
number of L-cycles in Kauffman networks, Eq. J2J, both 
by complete enumeration and using Monte Carlo meth- 
ods, and tested them against complete enumeration of 
the networks with TV < 4. The results for 2 < L < 6 are 
shown in Fig. ^ along with the corresponding asymp- 
totes. The asymptotes were obtained by Monte Carlo 
integration of Eq. l|12f) . For low N, (Cl) n is dominated 
by the boundary points neglected in Eq. and its 

qualitative behavior is not obvious in that region. 

A straighforward way to count attractors in a network 
is to simulate trajectories from random configurations 
and count distinct target attractors. As has been pointed 
out in |ll|. this gives a biased estimate. Simulations 
in Q with up to 200 trajectories per network indicated 
\/N scaling with system size, whereas j 1 it reported linear 
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FIG. 1: The number of L-cycles as functions of the network 
size for 2 < L < 6. The numbers in the figure indicate L. 
Dotted lines are used for values obtained by Monte Carlo 
summation, with errors comparable to the line width. The 
asymptotes for N < 8 have been included as dashed lines. 
Their slopes are 72 = 73 = 5, 74 = 75 = 1, 76 = |, 77 = 3, 
and 7s = . 



4 



Cycles 




1 10 10 2 10 3 10 4 



FIG. 2: The number of observed attractors (a) and 2-cycles 
(b) per network as functions of N for different numbers of 
trajectories r: lOO(circles), 10 3 (squares) , 10 4 (diamonds), and 
10 5 (triangles). In (a), the dashed lines have slopes 0.5, 1, 
and 2. In (b) the solid line shows {C2) N , the true number of 
2-cycles. In the high end of (b), la corresponds to roughly 
20%. 

behavior with 1000 trajectories. That the true growth is 
faster than linear has now been firmly established [l^ . 

To closer examine the problem of biased undersam- 
pling, we have implemented the network reduction algo- 
rithm from [llj , and gathered statistics on networks with 
TV < 10 4 . We repeated the simulations for different num- 
bers of trajectories r, with 100 < r < 10 5 . For each N 
and r, 10 3 network realizations were examined, and we 
discarded configurations if no cycle was found within 2 13 
time steps. The results are summarized in Fig. 

For t — 100, the number of attractors follows y/N 
remarkably well, considering that r = 10 3 gives the quite 
different N behavior seen in [llj . If we extrapolate wildly 
from a log-loglog plot, e °- 3v ^ fits the data rather well. 

As another example of how severe the biased under- 
sampling is, we have included a plot of the number of 
2-cycles found in the simulations (Fig. |2t>). The un- 
derlying distribution is less uniform than for the total 
number of attractors, so the errors are larger, but not 
large enough to obscure the qualitative behavior. The 



number of observed 2-cycles is close to (C2) N for low N, 
but as N grows, a vast majority of them are overlooked. 
As expected, this problem sets in sooner for lower r, al- 
though the difference is not as marked as it is for the 
total number of attractors. 



SUMMARY 

We have introduced a novel approach to analyzing at- 
tractors of random Boolean networks. By applying it to 
Kauffman networks, we have proven that the number of 
attractors in these grows faster than any power law with 
network size N. This is in sharp contrast with the previ- 
ously cited ^/N behavior, but in agreement with recent 
findings. 

For the Kauffman model, we have derived an expres- 
sion for the asymptotic growth of the number of L-cycles, 
(Cl)jv- This expression is corroborated by statistics from 
network simulations. The simulations also demonstrate 
that biased undersampling of state space is a good ex- 
planation for the previously observed y/N behavior. 

We wish to thank Carsten Peterson, Bo Soderberg, 
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Genomics and Bioinformatics. 
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